(57) Abstract: A face recognition method for working with two or more collections of facial images is provided. A representation
framework is determined for a first collection of facial images including at least principal component analysis (PCA) features. A
representation of said first collection is stored using the representation framework. A modified representation framework is determined
based on statistical properties of original facial image samples of a second collection of facial images and the stored representation
of the first collection. The first and second collections are combined without using original facial image samples. A representation
of the combined image collection (super-collection) is stored using the modified representation framework. A representation of a
current facial image, determined in terms of the modified representation framework, is compared with one or more representations
of facial images of the combined collection. Based on the comparing, it is determined which, if any, of the facial images within the
combined collection matches the current facial image.

FACE RECOGNITION WITH COMBINED PCA-BASED DATASETS

PRIORITY 

This application claims the benefit of priority to United States provisional patent
application no. 60/821,165, filed August 2, 2006, and this application is counterpart to United
States patent application no. 11/833,224, filed August 2, 2007, both of which are hereby
incorporated by reference.

BACKGROUND OF THE INVENTION

Face tracking technology has been recently introduced into consumer digital cameras,
enabling a new generation of user tools for the analysis and management of image collections
(see, e.g., http://www.fotonation.com/index.php?module=company.news&id=39, wherein the
entire site www.fotonation.com is incorporated by reference). In earlier research, it had been
concluded that it should be practical to employ face similarity measures as a useful tool for
sorting and managing personal image collections (see, e.g., P. Corcoran, G. Costache,
Automated sorting of consumer image collections using face and peripheral region image
classifiers, Consumer Electronics, IEEE Transactions on, Volume 51, Issue 3, Aug. 2005,
Page(s): 747-754; G. Costache, R. Mulryan, E. Steinberg, P. Corcoran, In-camera person-indexing
of digital images, Consumer Electronics, 2006, ICCE '06, 2006 Digest of Technical
Papers, International Conference on, 7-11 Jan. 2006, Page(s): 2; and P. Corcoran and G. Costache,
Automatic System for In-Camera Person Indexing of Digital Image Collections, Conference
Proceedings, GSPx 2006, Santa Clara, Ca., Oct. 2006, which are all hereby incorporated by
reference). The techniques described in this research rely on the use of a reference image
collection as a training set for PCA-based analysis of face regions.

For example, it has been observed that when images are added to such a collection there
is no immediate requirement for retraining of the PCA basis vectors, and that results remain
self-consistent as long as the number of new images added is not greater than approximately 20% of
the number in the original image collection. Conventional wisdom on PCA analysis would
suggest that once the number of new images added to a collection reaches a certain percentage,
it becomes necessary to retrain and obtain a new set of basis vectors.
This retraining process is time consuming and also invalidates any stored PCA-based
data from images that were previously analyzed. It would be far more efficient if we could
find a means to transform face region data between different basis sets.

In addition, it has been suggested that it is possible to combine training data from two or 
more image collections to determine a common set of basis vectors without a need to retrain 
from the original face data. This approach has been developed from use of the "mean face" of an 
image collection to determine the variation between two different image collections. 

The mean face is computed as the average face across the members of an image
collection. Other faces are measured relative to the mean face. Initially, the mean face was used
to measure how much an image collection had changed when new images were added to that
collection. If the mean face variation does not exceed a small percentage, it can be
assumed that there is no need to recompute the eigenvectors and to re-project the data into
another eigenspace. If, however, the variation between the two collections is significant, then the
basis vectors are retrained and a new set of fundamental eigenvectors is
obtained. For a large image collection, this is both time consuming and inefficient, as stored
eigenface data is lost. It is thus desired to have an alternative to complete retraining
that is both effective and efficient.

SUMMARY OF THE INVENTION 
A face recognition method for working with two or more collections of facial images is
provided. A representation framework is determined for a first collection of facial images
including at least principal component analysis (PCA) features. A representation of said first
collection is stored using the representation framework. A modified representation framework is
determined based on statistical properties of original facial image samples of a second collection
of facial images and the stored representation of the first collection. The first and second
collections are combined without using original facial image samples. A representation of the
combined image collection (super-collection) is stored using the modified representation
framework. A representation of a current facial image, determined in terms of the modified
representation framework, is compared with one or more representations of facial images of the
combined collection. Based on the comparing, it is determined whether one or more of the facial
images within the combined collection matches the current facial image.

The first collection may be updated by combining the first collection with a third
collection by adding one or more new data samples to the first collection, e.g., according to the
further face recognition method described below.

Data samples of the first collection may be re-projected into a new eigenspace without 
using original facial images of the first collection. The re-projecting may instead use existing 
PCA features. 

Training data may be combined from the first and second collections. 

The method may be performed without using original data samples of the first collection. 

The first and second collections may contain samples of different dimension. In this 
case, the method may include choosing a standard size of face region for the new collection, and 
re-sizing eigenvectors using interpolation to the standard size. The samples of different 
dimension may be analyzed using different standard sizes of face region. 

The modified representation framework may be generated in accordance with the 
following: 



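$$
\widehat{\mathrm{Cov}}_{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[\hat{E}^{C1}\times\hat{V}^{C1}\times\hat{E}^{C1\,T}\right] + \frac{N^{C2}}{N^{C1}+N^{C2}}\left[\hat{E}^{C2}\times\hat{V}^{C2}\times\hat{E}^{C2\,T}\right] + \frac{N^{C1}N^{C2}}{\left(N^{C1}+N^{C2}\right)^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
$$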

A further face recognition method for working with two or more collections of facial
images is also provided. Different representation frameworks are determined for first and second
collections of facial images each including at least principal component analysis (PCA) features.
Different representations of the first and second collections are stored using the different
representation frameworks. A modified representation framework is determined based on the
different representations of the first and second collections, respectively. The first and second
collections are combined without using original facial image samples. A representation of the
combined image collection (super-collection) is stored using the modified representation
framework. A representation of a current facial image, determined in terms of the modified
representation framework, is compared with one or more representations of facial images of the
combined collection. Based on the comparing, it is determined whether one or more of the facial
images within the combined collection matches the current facial image.

The PCA features may be updated based on first eigenvectors and first sets of 
eigenvalues for each of the first and second collections. The updating may include calculating a 
new set of eigenvectors from the previously calculated first eigenvectors of the first and second 
collections. 

The method may be performed without using original data samples of the first and second 
collections. 

The first and second collections may contain samples of different dimension. A standard 
size of face region may be chosen for the new collection. Eigenvectors may be re-sized using 
interpolation to the standard size. The samples of different dimension may be analyzed using 
different standard sizes of face region. 

The modified representation framework may be generated in accordance with the 
following: 



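$$
\widehat{\mathrm{Cov}}_{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[\hat{E}^{C1}\times\hat{V}^{C1}\times\hat{E}^{C1\,T}\right] + \frac{N^{C2}}{N^{C1}+N^{C2}}\left[\hat{E}^{C2}\times\hat{V}^{C2}\times\hat{E}^{C2\,T}\right] + \frac{N^{C1}N^{C2}}{\left(N^{C1}+N^{C2}\right)^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
$$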



One or more storage media may be provided having embodied therein program code for
programming one or more processors to perform any of the methods described herein.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates the application of PCA to a raw dataset.

Figure 2 illustrates the combining of two different PCA representations of datasets in 
accordance with a preferred embodiment. 

Figure 3 illustrates the combining of PCA representations of datasets with supplemental 
raw data in accordance with a preferred embodiment. 

Figure 4 shows several plots of eigenvalue distributions for both a first and a second 
collection for both the classical PCA method and a method in accordance with a preferred 
embodiment. 

Figures 5a-5d illustrate first eigenfaces from both a first and a second collection for both 
the classical method and a method in accordance with a preferred embodiment. 

DETAILED DESCRIPTION OF THE EMBODIMENTS 
PCA analysis is a very common technique used in signal processing for reducing data
dimensions and in pattern recognition for feature extraction. The main disadvantage of the
technique is that the resulting PCA features are data dependent, which means that they have to be
re-computed every time the data collection changes its format due to merging or splitting
between multiple independent datasets or adding/deleting data samples from a given dataset.
Embodiments of the invention are described below for combining multiple PCA datasets.
Embodiments are also provided for updating one PCA dataset when data samples are changed,
using the old PCA values and the statistical properties of the PCA space of each dataset without
using the original data values. This is very useful when the original data values are no longer
available or when it is not practical to re-use them for very large data collections.

Face recognition tasks are described for two cases. The first case is when it is desired to
combine two face collections already analyzed using PCA without applying PCA analysis on the
merged face collection. The second case is when it is desired to add new face samples to one
collection already trained using PCA. The described methods are shown to yield results at least
similarly effective to those of the classical approach of applying the PCA algorithm on the merged
collection, while involving far fewer computational resources.

The techniques provided herein offer a significant saving in computational effort, are
quite robust, and offer high repeatability in the context of eigenface analysis. The algorithm for
re-computing the new eigenspace is preferably similar to the approach described in Hall P.,
Marshall D., and Martin R., "Adding and Subtracting Eigenspaces," British Machine Vision
Conference, Vol. 2, 1999, pp. 463-472; and Hall P., Marshall D., and Martin R., "Merging and
splitting eigenspace models," PAMI, 22(9), 2000, pp. 1042-1048, which are each hereby
incorporated by reference. The data samples may be re-projected into the new eigenspace
without using the original data samples (the original faces), and instead using the old data
projections (the old principal components).

A formal theoretical explanation of this method is provided below, and it is demonstrated
that it is broadly applicable to other applications where PCA-based analysis is employed. A
theoretical basis of principal component analysis as it is applied to eigenface analysis of face
regions is next described. Preferred and alternative approaches to combining training data from
multiple, previously-trained datasets are then discussed. We next present the results of some
experiments with image collections drawn from very different sources and demonstrate the
practical results of applying our technique. Finally, a description of the application of this
technique in building practical tools and an application framework for the analysis, sorting and
management of personal image collections is given.

THEORY

Principal Component Analysis (PCA) is one of the most common approaches used in
signal processing to reduce the dimensionality of problems where we need to deal with large
collections of data samples. PCA is a linear transformation that maps the data to a new
coordinate system so that the greatest variance across the data set comes to lie on the first
coordinate or principal component, the second greatest variance on the second coordinate, and so
on. The basis vectors of this coordinate system are the eigenvectors of the covariance matrix of the
data samples, and the coefficients for each data sample are the weights, or principal components, of that data
sample. Unlike other linear transformations, such as DCT, PCA does not have a fixed set of basis
vectors. Its basis vectors depend on the data set.

PCA can be used for reducing dimensionality in a dataset while retaining those
characteristics of the dataset that contribute most to its variance, by keeping lower-order
principal components and ignoring higher-order ones. One of the advantages of PCA is that we
need only compare the first 20 principal components out of a possible 1024 components where
we use face regions of 32x32 pixel size. Calculating the distance between the principal
components of two given data samples allows a measure of similarity between the two samples
to be determined.

The basis vectors for PCA are not fixed, thus when new data samples are added to a data 
set there are issues as to the validity and applicability of the PCA method. In particular, two 
situations which may arise include the following: 

(i) There are two collections of data which are already analyzed using PCA and it is desired to 
determine a common basis vector set for the combined collection and, ideally, transform the 
predetermined PCA descriptors without having to recalculate new descriptor sets for each 
item of data; and 

(ii) There is an existing collection of data which is analyzed using PCA and it is desired to add a 
significant number of new raw data items to this dataset. 

The classical solution for either of these cases is to recalculate the full covariance matrix 
of the new (combined) dataset and then recalculate the eigenvectors for this new covariance 
matrix. Computationally, however, this is not an effective solution, especially when dealing with 
large collections of data. 

A different technique is provided herein to calculate the basis vectors and the eigenvalues,
and to determine the principal components for the new collection. The first eigenvectors and the
first set of eigenvalues are used from each collection in case (i), and preferably only these. The
first eigenvectors and the first set of eigenvalues from the original collection and the new data are
used in case (ii). This has the advantage that the original data samples need not be saved in
memory; instead, only the principal components of these samples are stored. Also, the new
set of eigenvectors is calculated from the eigenvectors originally calculated for each collection
of data. A detailed mathematical description of these techniques is provided below.

PCA MATHEMATICAL MODEL
To begin, assume a collection of N data samples $S_i$ with $i = 1, \ldots, N$. Each data sample is a
vector of dimension m with m<N. The first step consists in changing the collection of data so
that it has zero mean, which is done by subtracting the mean of the data samples
($\mathrm{Mean}_S = \frac{1}{N}\sum_{i=1}^{N} S_i$) from each sample. The PCA algorithm consists in computing the
covariance matrix $\mathrm{Cov}_{SS}$ using eq. 1:

$$
\mathrm{Cov}_{SS} = \frac{1}{N}\left[S-\mathrm{Mean}_S\right]\times\left[S-\mathrm{Mean}_S\right]^{T} \tag{1}
$$



The matrix S is formed by concatenating each data vector $S_i$. The size of the covariance
matrix is m x m and is independent of the number of data samples N. We can compute the
eigenvector matrix $E = [e_1\; e_2 \ldots e_m]$, where each eigenvector $e_i$ has the same dimension m as the data
samples, and the eigenvalues $[v_1\; v_2 \ldots v_m]$ of the covariance matrix using eq. 2:

$$
\mathrm{Cov}_{SS} = E\times V\times E^{T} \tag{2}
$$

where the matrix V has all the eigenvalues on the diagonal and zeros elsewhere:

$$
V = \begin{bmatrix} v_1 & 0 & \cdots & 0 \\ 0 & v_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_m \end{bmatrix} \tag{3}
$$



We can reconstruct each data sample using a linear combination of all eigenvectors
using eq. 4:

$$
S_i = E\times P_i + \mathrm{Mean}_S \tag{4}
$$

where $P_i$ represents the principal component coefficients for data sample $S_i$ and can be computed
by projecting the data sample on the coordinates given by the eigenvectors using eq. 5:

$$
P_i = E^{T}\times\left[S_i-\mathrm{Mean}_S\right] \tag{5}
$$

By arranging the eigenvalues in descending order we can approximate the covariance
matrix by keeping only a small number of eigenvectors corresponding to the largest
eigenvalues. The number of eigenvalues kept is usually determined by applying a threshold to the
values, or by maintaining the energy of the original data. If we keep the first n<m eigenvalues and
their corresponding eigenvectors, the approximation of the covariance matrix can be computed
as:

$$
\widehat{\mathrm{Cov}}_{SS} = \hat{E}\times\hat{V}\times\hat{E}^{T} \tag{6}
$$

with $\hat{V} = [v_1\; v_2 \ldots v_n]$.

The data can likewise be approximated using only the first n principal coefficients, which
correspond to the eigenvectors with the largest variations inside the data collection:

$$
\hat{S}_i = \hat{E}\times\hat{P}_i + \mathrm{Mean}_S \tag{7}
$$

The standard results of applying PCA to a raw dataset are illustrated in Fig 1. 
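
The steps of eqs. 1 to 7 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the experiments: the function names `pca_fit` and `pca_reconstruct`, the synthetic random data and the choice of 16-dimensional samples are all assumptions made for the example.

```python
import numpy as np

def pca_fit(S, n_keep):
    """S is an (m, N) matrix with one data sample per column (hypothetical helper)."""
    mean = S.mean(axis=1, keepdims=True)       # Mean_S, shape (m, 1)
    centered = S - mean                        # zero-mean data
    cov = centered @ centered.T / S.shape[1]   # eq. 1, shape (m, m)
    vals, vecs = np.linalg.eigh(cov)           # eq. 2, eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_keep]    # keep the n largest eigenvalues
    E_hat, v_hat = vecs[:, order], vals[order]
    P_hat = E_hat.T @ centered                 # eq. 5, one coefficient column per sample
    return mean, E_hat, v_hat, P_hat

def pca_reconstruct(mean, E_hat, P_hat):
    return E_hat @ P_hat + mean                # eq. 4 (all components) or eq. 7 (truncated)

# Synthetic stand-in for face data: N = 200 samples of dimension m = 16.
rng = np.random.default_rng(0)
S = rng.normal(size=(16, 200))
mean, E_hat, v_hat, P_hat = pca_fit(S, n_keep=16)
S_rec = pca_reconstruct(mean, E_hat, P_hat)
```

With all m components kept, the reconstruction reproduces the samples up to floating-point error; choosing a smaller `n_keep` gives the approximation of eq. 7.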

DATASET COMBINATION SCENARIOS 
Let us assume that we have one collection of data C1 which has been analyzed using the PCA
algorithm, which means that we have its eigenvectors $\hat{E}^{C1} = [e^{C1}_1\; e^{C1}_2 \ldots e^{C1}_{n_1}]$, the eigenvalues
$[v^{C1}_1\; v^{C1}_2 \ldots v^{C1}_{n_1}]$, and the PCA coefficients $\hat{P}^{C1}$ for each sample in the collection; in
addition, we have also stored the mean data sample of the collection, $\mathrm{Mean}^{C1}$. We can also
assume at this point that the original data samples are no longer available for analysis, or that it is not
practically viable to access them again. This fact is important when working with very large
datasets, where it would be time consuming to re-read all data samples in order to apply classical PCA, or
when working with temporary samples that can be deleted after they are first analyzed (e.g. when
working with images for face recognition, the user may delete some of the images used to construct PCA-based
models).

Now we consider two different cases where the data in the collection will be changed:
(i) either a new collection of data which was already analyzed using PCA has to
be added to the original collection in order to form a super-collection, or
(ii) a new set of raw (unanalyzed) data samples is to be added to the collection in
order to update it.

Let us consider the first case of combination:



COMBINING TWO COLLECTIONS OF PCA DATA 
Let us assume we want to combine collection C1 described above with another collection
of data C2, also PCA analyzed, with eigenvectors $\hat{E}^{C2} = [e^{C2}_1\; e^{C2}_2 \ldots e^{C2}_{n_2}]$, eigenvalues
$[v^{C2}_1\; v^{C2}_2 \ldots v^{C2}_{n_2}]$, the PCA coefficients $\hat{P}^{C2}$ and the mean data sample $\mathrm{Mean}^{C2}$. We want to
combine the two collections into a super collection C without accessing the original data from
the two collections, $S^{C1}$ and $S^{C2}$ ($S^{C1}$ and $S^{C2}$ are data matrices in which each column
represents a vector data sample). The mean sample in the super collection can be computed as:

$$
\mathrm{Mean} = \frac{N^{C1}\,\mathrm{Mean}^{C1} + N^{C2}\,\mathrm{Mean}^{C2}}{N^{C1}+N^{C2}} \tag{8}
$$

where $N^{C1}$ and $N^{C2}$ represent the number of data samples in each collection. It is easy to prove
[5] that the covariance of the super collection, $\mathrm{Cov}_{C}$, can be computed as:

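$$
\begin{aligned}
\mathrm{Cov}_{C} &= \frac{1}{N^{C1}+N^{C2}}\left[(S^{C1}-\mathrm{Mean})\;\;(S^{C2}-\mathrm{Mean})\right]\times\left[(S^{C1}-\mathrm{Mean})\;\;(S^{C2}-\mathrm{Mean})\right]^{T} \\
&= \frac{1}{N^{C1}+N^{C2}}\left[(S^{C1}-\mathrm{Mean}^{C1})\right]\times\left[(S^{C1}-\mathrm{Mean}^{C1})\right]^{T} + \frac{1}{N^{C1}+N^{C2}}\left[(S^{C2}-\mathrm{Mean}^{C2})\right]\times\left[(S^{C2}-\mathrm{Mean}^{C2})\right]^{T} \\
&\quad + \frac{1}{N^{C1}+N^{C2}}\cdot\frac{N^{C1}N^{C2}}{N^{C1}+N^{C2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T} \\
&= \frac{N^{C1}}{N^{C1}+N^{C2}}\,\mathrm{Cov}_{C1} + \frac{N^{C2}}{N^{C1}+N^{C2}}\,\mathrm{Cov}_{C2} + \frac{N^{C1}N^{C2}}{\left(N^{C1}+N^{C2}\right)^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
\end{aligned}
\tag{9}
$$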
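
The identities in eqs. 8 and 9 can be checked numerically. The following sketch assumes synthetic random data in place of face samples and a hypothetical helper `cov`; it compares the combined mean and covariance computed from per-collection statistics with the values computed directly from the concatenated data.

```python
import numpy as np

def cov(S):
    """Covariance of the columns of S with 1/N normalisation, as in eq. 1."""
    c = S - S.mean(axis=1, keepdims=True)
    return c @ c.T / S.shape[1]

# Two synthetic collections of different size and different mean.
rng = np.random.default_rng(1)
S1 = rng.normal(loc=0.0, size=(8, 60))
S2 = rng.normal(loc=2.0, size=(8, 25))
N1, N2 = S1.shape[1], S2.shape[1]
m1 = S1.mean(axis=1, keepdims=True)
m2 = S2.mean(axis=1, keepdims=True)

# eq. 8: mean of the super collection from the two stored means.
mean_c = (N1 * m1 + N2 * m2) / (N1 + N2)

# eq. 9 (last line): weighted covariances plus the mean-difference term.
d = m1 - m2
cov_c = (N1 * cov(S1) + N2 * cov(S2)) / (N1 + N2) \
    + (N1 * N2) / (N1 + N2) ** 2 * (d @ d.T)

# Reference values computed from the concatenated raw data.
S = np.hstack([S1, S2])
direct_mean = S.mean(axis=1, keepdims=True)
direct_cov = cov(S)
```

Both pairs agree to floating-point precision, which is what allows the super collection to be described without revisiting the raw samples, provided the per-collection covariances (or an approximation of them) are available.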



As was stated above, we cannot re-compute the covariance matrix using eq. 1, which
requires all of the original data samples, in our case the original faces. There are two options:
either we store the complete covariance matrices from the two collections and use them to
compute the exact values of the covariance matrix of the super collection, or we approximate
each covariance matrix using the eigenvalues and eigenvectors from each individual collection,
viz:

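$$
\widehat{\mathrm{Cov}}_{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[\hat{E}^{C1}\times\hat{V}^{C1}\times\hat{E}^{C1\,T}\right] + \frac{N^{C2}}{N^{C1}+N^{C2}}\left[\hat{E}^{C2}\times\hat{V}^{C2}\times\hat{E}^{C2\,T}\right] + \frac{N^{C1}N^{C2}}{\left(N^{C1}+N^{C2}\right)^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T} \tag{10}
$$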



If we assume that for each collection the eigen-decomposition (number of
eigenvectors retained) was done so that the energy of the dataset is conserved, we can assume
(as is verified later in the tests) that the face space given by the eigen-decomposition of the covariance matrix
of the super collection will be close to the estimated face space given by the eigen-decomposition
of the estimated covariance matrix given by eq. 10.

In other words, we have to show that the eigenvectors and eigenvalues estimated using the
covariance matrix computed in eq. 10 are close to the eigenvectors and eigenvalues computed by
applying classical PCA on the concatenated super collection. Not all eigenvectors will be similar,
only the ones corresponding to the highest variations in the datasets (with the largest
corresponding eigenvalues).

The reconstruction of a combined dataset from two independently trained PCA data 
representations is illustrated in Fig 2. 

Another issue that may need to be addressed is the case where the two collections of data
contain samples of different dimension. For example, in face recognition it might be necessary to
combine two collections of faces that were analyzed using different standard sizes of face region.
In this case we should choose a standard size of face region for the super collection (e.g. the
minimum of the two sizes used), resizing the eigenvectors of the combined
collection to this standard size using interpolation.
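
One possible way to resize an eigenvector by interpolation is sketched below. The text does not specify the interpolation scheme, so separable linear (bilinear) interpolation, square face regions and the final renormalisation to unit length are all assumptions, and `resize_eigenface` is a hypothetical helper name.

```python
import numpy as np

def resize_eigenface(vec, old_side, new_side):
    """Bilinear resize of one eigenvector viewed as an old_side x old_side image."""
    img = vec.reshape(old_side, old_side)
    old = np.linspace(0.0, 1.0, old_side)   # normalised pixel positions, old grid
    new = np.linspace(0.0, 1.0, new_side)   # normalised pixel positions, new grid
    # Interpolate along each row, then along each column of the intermediate result.
    rows = np.array([np.interp(new, old, r) for r in img])       # (old_side, new_side)
    out = np.array([np.interp(new, old, c) for c in rows.T]).T   # (new_side, new_side)
    flat = out.ravel()
    return flat / np.linalg.norm(flat)      # assumed step: restore unit length

# Example: bring a 32x32 eigenvector down to a 16x16 standard size.
rng = np.random.default_rng(5)
eigenface_32 = rng.normal(size=32 * 32)
eigenface_16 = resize_eigenface(eigenface_32, 32, 16)
```

Resized eigenvectors are in general no longer exactly orthogonal to one another, which is one reason the eigen-decomposition of the combined covariance is recomputed afterwards rather than reusing the resized vectors directly.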

Once we have determined the covariance matrix of the super collection we can apply the
eigen-decomposition again and compute the eigenvectors $\hat{E} = [e_1\; e_2 \ldots e_n]$ and the eigenvalues
$\hat{V} = [v_1\; v_2 \ldots v_n]$ of the super collection. The number of eigenvalues kept for analysis, n, is
independent of $n_1$ and $n_2$. Once we have the eigenvectors we can project the data samples.
Remember that we do not have the original data samples available to project, so we have to
re-create them from the old PCA coefficients. To re-create a data sample we can use eq.
4. The product of the old eigenvectors and the old coefficients represents the data sample from which
the mean data sample of its own collection was subtracted, so the value of the data sample is computed as:

$$
\hat{S}_i^{\{C1,C2\}} = \hat{E}^{\{C1,C2\}}\times\hat{P}_i^{\{C1,C2\}} + \mathrm{Mean}^{\{C1,C2\}} \tag{11}
$$

where the superscript {C1,C2} denotes whichever of the two collections the sample belongs to.
We have to subtract the mean of the super collection, Mean (eq. 8), from this data sample.
We can re-estimate the coefficients $\hat{P}^{C}$ for each data sample in the super collection as:

$$
\hat{P}_i^{C} = \hat{E}^{T}\times\left[\hat{E}^{\{C1,C2\}}\times\hat{P}_i^{\{C1,C2\}} + \mathrm{Mean}^{\{C1,C2\}} - \mathrm{Mean}\right] \tag{12}
$$
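
The re-projection of stored coefficients into a new eigenspace (eqs. 11 and 12) can be sketched as follows. The function name `reproject`, the synthetic data, and the randomly generated orthonormal matrix standing in for the super-collection eigenvectors are assumptions for illustration.

```python
import numpy as np

def reproject(P_old, E_old, mean_old, E_new, mean_new):
    """Move stored coefficients into a new eigenspace without the raw samples."""
    S_hat = E_old @ P_old + mean_old        # eq. 11: re-create the samples
    return E_new.T @ (S_hat - mean_new)     # eq. 12: project onto the new basis

# Synthetic collection analysed with a full (untruncated) basis, so eq. 11 is exact here.
rng = np.random.default_rng(3)
S = rng.normal(size=(6, 40))
mean_old = S.mean(axis=1, keepdims=True)
c = S - mean_old
E_old = np.linalg.eigh(c @ c.T / S.shape[1])[1]
P_old = E_old.T @ c

# Stand-ins for the super-collection basis and mean (any orthonormal columns will do).
E_new = np.linalg.qr(rng.normal(size=(6, 6)))[0][:, :3]
mean_new = mean_old + 0.5

P_new = reproject(P_old, E_old, mean_old, E_new, mean_new)
P_direct = E_new.T @ (S - mean_new)         # what projecting the raw data would give
```

Because the old basis is complete in this toy example, `P_new` matches `P_direct` exactly; with a truncated old basis the result is an approximation whose quality depends on the energy retained.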
In the second case (ii), we assume that we have the collection of data C1 described above and we want to add
$N^{C2}$ new data samples that have not yet been analyzed. The sample matrix is $S^{C2}$ and its mean
value is $\mathrm{Mean}^{C2}$. The new covariance matrix is computed as:

$$
\widehat{\mathrm{Cov}}_{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[\hat{E}^{C1}\times\hat{V}^{C1}\times\hat{E}^{C1\,T}\right] + \frac{1}{N^{C1}+N^{C2}}\left[(S^{C2}-\mathrm{Mean}^{C2})\times(S^{C2}-\mathrm{Mean}^{C2})^{T}\right] + \frac{N^{C1}N^{C2}}{\left(N^{C1}+N^{C2}\right)^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]\times\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T} \tag{13}
$$

Applying the same algorithm as described for case (i), we compute the new eigenvectors and
eigenvalues. Using eq. 12 we can update the PCA coefficients of the initial collection, and to
compute the PCA coefficients of the new data samples we use:

$$
\hat{P}_i^{C2} = \hat{E}^{T}\times\left[S_i^{C2}-\mathrm{Mean}\right] \tag{14}
$$

where Mean can be computed using eq. 8.

This situation where additional raw data is added to a dataset with a (trained) PCA 
representation is illustrated in Fig 3. 
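
A minimal sketch of this second scenario, under the assumption that only the sample count, mean, eigenvectors and eigenvalues of the trained collection are stored, might look as follows. The function name `add_raw_samples` and the synthetic data are hypothetical.

```python
import numpy as np

def add_raw_samples(N1, mean1, E1, v1, S2, n_keep):
    """Update a stored PCA model (N1, mean1, E1, v1) with raw samples S2."""
    N2 = S2.shape[1]
    n = N1 + N2
    mean2 = S2.mean(axis=1, keepdims=True)
    mean = (N1 * mean1 + N2 * mean2) / n                   # eq. 8
    c2 = S2 - mean2
    d = mean1 - mean2
    cov = (N1 / n) * (E1 * v1) @ E1.T \
        + (c2 @ c2.T) / n \
        + (N1 * N2 / n ** 2) * (d @ d.T)                   # eq. 13
    vals, vecs = np.linalg.eigh(cov)
    idx = np.argsort(vals)[::-1][:n_keep]
    E = vecs[:, idx]
    P2 = E.T @ (S2 - mean)                                 # eq. 14
    return mean, E, vals[idx], P2

# Synthetic example: a trained collection of 120 samples and 30 new raw samples.
rng = np.random.default_rng(4)
S1 = rng.normal(size=(8, 120))
S2 = rng.normal(size=(8, 30)) + 0.7
mean1 = S1.mean(axis=1, keepdims=True)
c1 = S1 - mean1
v1, E1 = np.linalg.eigh(c1 @ c1.T / S1.shape[1])           # all 8 components kept

mean, E, v, P2 = add_raw_samples(S1.shape[1], mean1, E1, v1, S2, n_keep=8)

# Classical reference: PCA eigenvalues of the concatenated raw data.
S = np.hstack([S1, S2])
cs = S - S.mean(axis=1, keepdims=True)
classical_v = np.sort(np.linalg.eigvalsh(cs @ cs.T / S.shape[1]))[::-1]
```

Since all components of the trained collection are kept in this toy example, the updated eigenvalues coincide with the classical ones; truncating `E1` and `v1` would turn the equality into an approximation.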

EXPERIMENTAL RESULTS 
We applied the method described in the previous section to our purpose: face
recognition. We used an initial test collection of 560 faces (56 individuals, each with 10 faces).
The images were separated randomly into two datasets: a training dataset containing half of the
faces, and a testing dataset containing the remaining half.

Two separate tests were performed by splitting the training faces into two collections
with different numbers of faces. The faces were randomly assigned to one of the initial
collections:

1. Test A: we split the training collection into 2 collections, each with 140 faces, and
performed classification tasks for three cases: simple concatenation of PCA
coefficients without updating, classical combination using data samples, and the
proposed method.

2. Test B: we used two collections, one with 240 faces and the other with the remaining 40
faces, in two cases: the classical approach and the proposed method. For this test
we also used both scenarios: combining two collections of PCA data and adding new data
to a trained collection.

All faces were resized to 16x16 pixels, gray scale images were used, and the number of
PCA coefficients used in the classification stage was always 20. For classification the nearest
neighbour method was used, with the Euclidean distance between feature vectors.
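
The classification step can be sketched as a nearest-neighbour search over PCA coefficient vectors. The helper name `nearest_neighbour`, the two-coefficient toy gallery and the identity labels are hypothetical; in the tests described above each feature vector would hold 20 coefficients.

```python
import numpy as np

def nearest_neighbour(query, gallery, labels):
    """Return the label of the gallery column closest to query (Euclidean distance)."""
    dists = np.linalg.norm(gallery - query[:, None], axis=0)
    return labels[int(np.argmin(dists))]

# Toy gallery: one column of PCA coefficients per known face (hypothetical values).
gallery = np.array([[0.0, 5.0, 10.0],
                    [0.0, 5.0, 10.0]])
labels = ["person_a", "person_b", "person_c"]

query = np.array([4.0, 6.0])
match = nearest_neighbour(query, gallery, labels)
```

Here the query lies closest to the second gallery column, so `match` is `"person_b"`.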

Fig. 4 shows the variation of the first 50 eigenvalues using the classical approach of
combining two collections and the proposed method, along with the eigenvalue variations inside
the two separate collections. It can be noticed that the curves for the classical approach
and the proposed approach are very similar, and close to identical in the first
part of the graph, where the classical eigenvalues cannot be distinguished from those of the proposed method.

In order to see how the eigenvectors of the classical combination differ from those of the
proposed method, Figs. 5a-5d show the first eigenfaces from each collection along with the first
eigenfaces obtained using the classical combination and the proposed method. It can be noted
that the representations using the two methods have almost identical distributions.

It can be noted that the first eigenvector obtained by applying classical PCA over the
combined collection is similar to the first eigenvector obtained using our approach, and both of
them are quite different from the first eigenvector of each of the two sub-collections
that we want to combine.

For Test A our second scenario is unlikely because the collections have the same number of
samples, so we tested only the first scenario: combining the two collections. The results are given
in Table 1.

Method                   Combining collections
Simple concatenation     48.32%
Classical combination    84.28%
Proposed method          85.23%

Table 1. Recognition Rates for Test A



For Test B we used both scenarios: combining two collections (one having 240 data
samples and the other having only 40 samples) and adding new samples to one collection. The
results are given in Table 2.

Method                   Combining collections    Adding samples to collection
Simple concatenation     62.14%                   N/A
Classical combination    84.28%                   84.28%
Proposed method          84.64%                   83.14%

Table 2. Recognition Rates for Test B



It can be observed that in both tests the proposed combination method produced results
very close to those of the classical approach of completely re-analyzing the newly combined collection
using PCA. On the other hand, as expected, if the PCA coefficients are not updated after
combination or after the addition of multiple samples to the collection, the recognition rate drops
significantly. For our test case, adding 16% of new images to a collection produced more than a
20% decline in the recognition rate.

Techniques have been described for updating the PCA coefficients of data samples when
the collection of data changes due to adding new data or combining two collections of data
previously analyzed using PCA into a super collection. An advantage of the techniques is that
they do not require that the original dataset be preserved in order to update its coefficients. This
is very helpful when analyzing large collections of data and mandatory when the original data is
lost. Another advantage is that the technique is much faster than the classical method of
recomputing the PCA coefficients using the original dataset, because the dimension of the
PCA data is significantly smaller than the dimension of the original data, dimensionality
reduction being one of the properties of the PCA algorithm. For example, if a face image
raw data sample has 64x64 pixels, about 4k of pixel data is used in storing that
data sample. Now, if there are 100 faces, then about 400k of data space is involved. In a PCA
representation, probably 20-30 basis vectors are used to describe each face. That means that 100
images can be stored using about 3000, or 3k of data, which is more than 130 times less than
storing the raw data samples. So, an advantage of the technique is that the original face images
are not needed, as the technique regenerates them from the PCA representation framework.
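
The storage comparison can be reproduced with a few lines of arithmetic. This sketch counts stored values rather than bytes and assumes 30 coefficients per face (the upper end of the 20-30 range mentioned above); it counts only the per-face coefficients, not the shared eigenvectors and mean face, which add a fixed overhead that does not grow with the number of faces.

```python
side = 64              # face region is side x side pixels
faces = 100            # number of stored faces
coeffs_per_face = 30   # assumed number of PCA coefficients kept per face

raw_values = side * side * faces        # 64 * 64 * 100 = 409600 pixel values
pca_values = coeffs_per_face * faces    # 30 * 100 = 3000 coefficients
ratio = raw_values / pca_values         # roughly 136.5

print(raw_values, pca_values, round(ratio, 1))
```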

While exemplary drawings and specific embodiments of the present invention have
been described and illustrated, it is to be understood that the scope of the present invention is
not to be limited to the particular embodiments discussed. Thus, the embodiments shall be
regarded as illustrative rather than restrictive, and it should be understood that variations may be
made in those embodiments by workers skilled in the arts without departing from the scope of
the present invention as set forth in the claims that follow and their structural and functional
equivalents.

In addition, in methods that may be performed according to the claims below and/or 
preferred embodiments herein, the operations have been described in selected typographical 
sequences. However, the sequences have been selected and so ordered for typographical 
convenience and are not intended to imply any particular order for performing the operations, 
unless a particular ordering is expressly provided or understood by those skilled in the art as 
being necessary. 

All references cited above, as well as that which is described as background, the 
invention summary, the abstract, the brief description of the drawings and the drawings, are 
hereby incorporated by reference into the detailed description of the preferred embodiments as 
disclosing alternative embodiments. 

We claim: 

1. A face recognition method for working with two or more collections of facial images, 
comprising: 

(a) determining a representation framework for a first collection of facial images 
including at least principal component analysis (PCA) features; 

(b) storing a representation of said first collection using said representation framework; 

(c) determining a modified representation framework based on statistical properties of 
original facial image samples of a second collection of facial images and the stored 
representation of the first collection; 

(d) combining the first and second collections without using original facial image 
samples; 

(e) storing a representation of the combined image collection (super-collection) using 
said modified representation framework; 

(f) comparing a representation of a current facial image, determined in terms of said 
modified representation framework, with one or more representations of facial images of the 
combined collection; and 

(g) based on the comparing, determining whether one or more of the facial images within 
the combined collection matches the current facial image. 

2. The method of claim 1, further comprising, prior to (b), updating said first collection by 
combining the first collection with a third collection. 

3. The method of claim 2, wherein the updating comprises adding one or more new data samples 
to the first collection. 

4. The method of claim 1, further comprising re-projecting data samples of the first collection 
into a new eigenspace without using original facial images of the first collection. 



WO 2008/015586 



PCT/IB2007/003985 




5. The method of claim 4, wherein the re-projecting uses existing PCA features. 



6. The method of claim 1, further comprising combining training data from said first and second 
collections. 



7. A method according to claim 1 which does not use original data samples of the first 
collection. 



8. The method of claim 1, wherein the first and second collections contain samples of different 
dimension, and the method further comprises: 

(i) choosing a standard size of face region for the new collection, and 

(ii) re-sizing eigenvectors using interpolation to the standard size. 

9. The method of claim 8, wherein said samples of different dimension were analyzed using 
different standard sizes of face region. 



10. The method of claim 1, further comprising generating the modified representation 
framework according to the following: 



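\[
\mathrm{Cov}^{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[E^{C1} \times V^{C1} \times (E^{C1})^{T}\right] + \frac{N^{C2}}{N^{C1}+N^{C2}}\left[E^{C2} \times V^{C2} \times (E^{C2})^{T}\right] + \frac{N^{C1}N^{C2}}{(N^{C1}+N^{C2})^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right] \times \left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
\]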



11. A face recognition method for working with two or more collections of facial images, 
comprising: 

(a) determining different representation frameworks for first and second collections of 
facial images each including at least principal component analysis (PCA) features; 

(b) storing different representations of said first and second collections using said 
different representation frameworks; 

(c) determining a modified representation framework based on the different 
representations of the first and second collections, respectively; 

(d) combining the first and second collections without using original facial image 
samples; 

(e) storing a representation of the combined image collection (super-collection) using 
said modified representation framework; 

(f) comparing a representation of a current facial image, determined in terms of said 
modified representation framework, with one or more representations of facial images of the 
combined collection; and 

(g) based on the comparing, determining whether one or more of the facial images within 
the combined collection matches the current facial image. 

12. The method of claim 11, further comprising updating the PCA features based on first 
eigenvectors and first sets of eigenvalues for each of the first and second collections. 

13. The method of claim 12, wherein the updating comprises calculating a new set of 
eigenvectors from the previously calculated first eigenvectors of the first and second collections. 

14. A method according to claim 11 which does not use original data samples of the first and 
second collections. 

15. The method of claim 11, wherein the first and second collections contain samples of 
different dimension, and the method further comprises: 

(i) choosing a standard size of face region for the new collection, and 

(ii) re-sizing eigenvectors using interpolation to the standard size. 

16. The method of claim 15, wherein said samples of different dimension were analyzed using 
different standard sizes of face region. 

17. The method of claim 11, further comprising generating the modified representation 
framework according to the following: 

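\[
\mathrm{Cov}^{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[E^{C1} \times V^{C1} \times (E^{C1})^{T}\right] + \frac{1}{N^{C1}+N^{C2}}\left[\left(S^{C2}-\mathrm{Mean}^{C2}\right) \times \left(S^{C2}-\mathrm{Mean}^{C2}\right)^{T}\right] + \frac{N^{C1}N^{C2}}{(N^{C1}+N^{C2})^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right] \times \left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
\]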



18. One or more processor readable media having program code embodied therein for 
programming one or more processors to perform a face recognition method for working with two 
or more collections of facial images, wherein the method comprises: 

(a) determining a representation framework for a first collection of facial images 
including at least principal component analysis (PCA) features; 

(b) storing a representation of said first collection using said representation framework; 

(c) determining a modified representation framework based on statistical properties of 
original facial image samples of a second collection of facial images and the stored 
representation of the first collection; 

(d) combining the first and second collections without using original facial image 
samples; 

(e) storing a representation of the combined image collection (super-collection) using 
said modified representation framework; 

(f) comparing a representation of a current facial image, determined in terms of said 
modified representation framework, with one or more representations of facial images of the 
combined collection; and 

(g) based on the comparing, determining whether one or more of the facial images within 
the combined collection matches the current facial image. 

19. The one or more media of claim 18, wherein the method further comprises, prior to (b), 
updating said first collection by combining the first collection with a third collection. 

20. The one or more media of claim 19, wherein the updating comprises adding one or more 
new data samples to the first collection. 

21. The one or more media of claim 18, wherein the method further comprises re-projecting data 
samples of the first collection into a new eigenspace without using original facial images of the 
first collection. 

22. The one or more media of claim 21, wherein the re-projecting uses existing PCA features. 

23. The one or more media of claim 18, wherein the method further comprises combining 
training data from said first and second collections. 

24. One or more media according to claim 18, wherein the method does not use original data 
samples of the first collection. 

25. The one or more media of claim 18, wherein the first and second collections contain samples 
of different dimension, and the method further comprises: 

(i) choosing a standard size of face region for the new collection, and 

(ii) re-sizing eigenvectors using interpolation to the standard size. 

26. The one or more media of claim 25, wherein said samples of different dimension were 
analyzed using different standard sizes of face region. 

27. The one or more media of claim 18, wherein the method further comprises generating the 
modified representation framework according to the following: 

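\[
\mathrm{Cov}^{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[E^{C1} \times V^{C1} \times (E^{C1})^{T}\right] + \frac{N^{C2}}{N^{C1}+N^{C2}}\left[E^{C2} \times V^{C2} \times (E^{C2})^{T}\right] + \frac{N^{C1}N^{C2}}{(N^{C1}+N^{C2})^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right] \times \left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
\]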



28. One or more processor readable media having program code embodied therein for 
programming one or more processors to perform a face recognition method for working with two 
or more collections of facial images, wherein the method comprises: 

(a) determining different representation frameworks for first and second collections of 
facial images each including at least principal component analysis (PCA) features; 

(b) storing different representations of said first and second collections using said 
different representation frameworks; 

(c) determining a modified representation framework based on the different 
representations of the first and second collections, respectively; 

(d) combining the first and second collections without using original facial image 
samples; 

(e) storing a representation of the combined image collection (super-collection) using 
said modified representation framework; 

(f) comparing a representation of a current facial image, determined in terms of said 
modified representation framework, with one or more representations of facial images of the 
combined collection; and 

(g) based on the comparing, determining whether one or more of the facial images within 
the combined collection matches the current facial image. 



29. The one or more media of claim 28, wherein the method further comprises updating the 
PCA features based on first eigenvectors and first sets of eigenvalues for each of the first and 
second collections. 

30. The one or more media of claim 29, wherein the updating comprises calculating a new set of 
eigenvectors from the previously calculated first eigenvectors of the first and second collections. 

31. One or more media according to claim 28, wherein the method does not use original data 
samples of the first and second collections. 

32. The one or more media of claim 28, wherein the first and second collections contain samples 
of different dimension, and the method further comprises: 

(i) choosing a standard size of face region for the new collection, and 

(ii) re-sizing eigenvectors using interpolation to the standard size. 

33. The one or more media of claim 32, wherein said samples of different dimension were 
analyzed using different standard sizes of face region. 

34. The one or more media of claim 28, wherein the method further comprises generating the 
modified representation framework according to the following: 



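\[
\mathrm{Cov}^{C} = \frac{N^{C1}}{N^{C1}+N^{C2}}\left[E^{C1} \times V^{C1} \times (E^{C1})^{T}\right] + \frac{1}{N^{C1}+N^{C2}}\left[\left(S^{C2}-\mathrm{Mean}^{C2}\right) \times \left(S^{C2}-\mathrm{Mean}^{C2}\right)^{T}\right] + \frac{N^{C1}N^{C2}}{(N^{C1}+N^{C2})^{2}}\left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right] \times \left[\mathrm{Mean}^{C1}-\mathrm{Mean}^{C2}\right]^{T}
\]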
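
By way of illustration only, the covariance combination recited in claims 10 and 27 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the claimed implementation: samples are stored as rows, covariances are normalized by the number of samples, and the helper names pca_model and combine_models are hypothetical.

```python
import numpy as np


def pca_model(samples, k):
    """Hypothetical helper: PCA of row-wise samples, keeping k components.

    Returns (mean, eigenvectors as columns, eigenvalues, sample count).
    """
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / len(samples)   # normalized by N (assumption)
    vals, vecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # indices of the k largest
    return mean, vecs[:, order], vals[order], len(samples)


def combine_models(model1, model2):
    """Hypothetical helper: combined mean and covariance from two stored
    PCA models, without access to the original samples."""
    mean1, e1, v1, n1 = model1
    mean2, e2, v2, n2 = model2
    n = n1 + n2
    diff = (mean1 - mean2)[:, None]              # column vector of mean difference
    cov = (
        (n1 / n) * (e1 * v1) @ e1.T              # N1/(N1+N2) * E1 V1 E1^T
        + (n2 / n) * (e2 * v2) @ e2.T            # N2/(N1+N2) * E2 V2 E2^T
        + (n1 * n2 / n**2) * (diff @ diff.T)     # mean-difference term
    )
    mean = (n1 * mean1 + n2 * mean2) / n
    return mean, cov


# Synthetic stand-ins for two collections of 16-dimensional samples.
rng = np.random.default_rng(0)
c1 = rng.normal(size=(40, 16))
c2 = rng.normal(loc=0.5, size=(60, 16))

mean_c, cov_c = combine_models(pca_model(c1, k=16), pca_model(c2, k=16))

# New eigenvectors of the combined collection, largest eigenvalue first.
new_vals, new_vecs = np.linalg.eigh(cov_c)
new_vals, new_vecs = new_vals[::-1], new_vecs[:, ::-1]

# Reference: covariance computed directly from the pooled samples.
both = np.vstack([c1, c2])
direct = np.cov(both, rowvar=False, bias=True)
print(np.allclose(cov_c, direct))
```

When all components are kept, as here, the combined covariance agrees with the covariance of the pooled samples up to floating-point error. When only the leading eigenvectors are stored, the same expression yields an approximation.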



